A probabilistic trajectory synthesis system for synthesising visual speech

Authors

  • Barry-John Theobald
  • Nicholas Wilkinson
Abstract

We describe an unsupervised probabilistic approach for synthesising visual speech from audio. Acoustic features representing a training corpus are clustered, and the probability density function (PDF) of each cluster is modelled as a Gaussian mixture model (GMM). A visual target in the form of a short-term parameter trajectory is generated for each cluster. Synthesis involves combining the cluster targets based on the likelihood of novel acoustic feature vectors, then cross-blending neighbouring regions of the synthesised short-term trajectories. The advantage of our approach is that coarticulation effects are explicitly captured by the mapping: the influence of each cluster target naturally increases and decreases with the likelihood of the acoustic feature vectors.
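
The synthesis step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several simplifying assumptions: each cluster is modelled by a single diagonal-covariance Gaussian (a one-component GMM), the visual parameter is a scalar, the cluster weights are likelihoods normalised to sum to one, and neighbouring short-term trajectories are cross-blended with a triangular window. All names (`cluster_weights`, `synthesise`, the dictionary keys) are hypothetical.

```python
import math


def diag_gaussian_logpdf(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def cluster_weights(x, clusters):
    """Normalised likelihood of x under each cluster (weights sum to 1)."""
    logs = [diag_gaussian_logpdf(x, c["mean"], c["var"]) for c in clusters]
    top = max(logs)  # subtract the maximum for numerical stability
    exps = [math.exp(value - top) for value in logs]
    total = sum(exps)
    return [e / total for e in exps]


def synthesise(frames, clusters):
    """Blend cluster targets per frame, then cross-blend the overlaps."""
    width = len(clusters[0]["target"])  # length of a short-term trajectory
    half = width // 2
    n = len(frames)
    acc = [0.0] * n
    norm = [0.0] * n
    # Assumed triangular window: the centre of each trajectory counts most.
    window = [1.0 - abs(k - half) / (half + 1.0) for k in range(width)]
    for t, x in enumerate(frames):
        w = cluster_weights(x, clusters)
        # Likelihood-weighted sum of the cluster target trajectories.
        local = [
            sum(wi * c["target"][k] for wi, c in zip(w, clusters))
            for k in range(width)
        ]
        # Overlap-add the short-term trajectory centred on frame t.
        for k in range(width):
            pos = t + k - half
            if 0 <= pos < n:
                acc[pos] += window[k] * local[k]
                norm[pos] += window[k]
    return [a / s for a, s in zip(acc, norm)]


# Toy data (made up): two clusters with flat targets, 1-D acoustic features.
clusters = [
    {"mean": [0.0], "var": [1.0], "target": [0.0, 0.0, 0.0]},
    {"mean": [4.0], "var": [1.0], "target": [1.0, 1.0, 1.0]},
]
frames = [[0.0], [2.0], [4.0]]
trajectory = synthesise(frames, clusters)
print(trajectory)
```

In this toy example the acoustic frames move from the first cluster towards the second, so the synthesised visual parameter rises smoothly from near the first target to near the second rather than switching abruptly. This is the sense in which the influence of a cluster target grows and shrinks with the likelihood of the acoustic features.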

Similar resources

A minimum converted trajectory error (MCTE) approach to high quality speech-to-lips conversion

High quality speech-to-lips conversion, investigated in this work, renders realistic lips movement (video) consistent with input speech (audio) without knowing its linguistic content. Instead of memoryless frame-based conversion, we adopt maximum likelihood estimation of the visual parameter trajectories using an audio-visual joint Gaussian Mixture Model (GMM). We propose a minimum converted tra...

GPS Jamming Detection in UAV Navigation Using Visual Odometry and HOD Trajectory Descriptor

Autonomous navigation of unmanned aerial vehicles (UAVs) in outdoor environments relies on a Global Positioning System (GPS) receiver. The power of the GPS signal at the Earth's surface is very low, which can degrade the performance of GPS receivers in environments contaminated by other sources of radio frequency interference (RFI). GPS jamming and spoofing are the most serious a...

Introducing visual target cost within an acoustic-visual unit-selection speech synthesizer

In this paper, we present a method to take into account visual information during the selection process in an acoustic-visual synthesizer. The acoustic-visual speech synthesizer is based on the selection and concatenation of synchronous bimodal diphone units, i.e., the speech signal and the 3D facial movements of the speaker. The visual speech information is acquired using a stereovision techniqu...

Photo-real lips synthesis with trajectory-guided sample selection

In this paper, we propose an HMM trajectory-guided, real image sample concatenation approach to photo-real talking head synthesis. It renders a smooth and natural video of articulators in sync with given speech signals. An audio-visual database is used to train a statistical Hidden Markov Model (HMM) of lips movement first and the trained model is then used to generate a visual parameter trajec...

Synthesising Singing

This is a review of some work carried out over the last decades at the Speech Music Hearing Department, KTH, where the analysis-by-synthesis strategy was applied to singing. The origin of the work was a hardware synthesis machine combined with a control program, which was a modified version of a text-to-speech conversion system. Two applications are described, one concerning vocal loudness varia...

Publication date: 2008